Universidad de Buenos Aires - Map-Time Surfer

VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Andrés Parra, Universidad de Buenos Aires, parra.andres@gmail.com

Tool(s):

The main tool used for visualization of the data is Map-Time Surfer, a program developed in Processing (http://processing.org) specifically for the contest (http://parramining.blogspot.com/2011/06/map-time-surfer.html). Other programs used were AWK (http://cm.bell-labs.com/cm/cs/awkbook/index.html), R (http://www.r-project.org/), Tableau (http://www.tableausoftware.com/), EditPad Lite (http://www.editpadlite.com/) and Microsoft Excel.

 

Video:

 


Transcript of the narration (my English is not as good as I would wish).

 

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

Based on an analysis of the events reflected in the messages, and of their location and time of occurrence, I believe the specific event that triggered this outbreak was the truck accident on the route 610 bridge, which happened on May 17th around 11 am (Fig. 1). A few hours later, symptoms of illness appear in the direction in which the wind could carry the infectious agent (Fig. 2), and after that a different kind of symptom appears down the river.

Fig. 1: Truck accident of May 17th, 11 am. (dots in yellow)

Fig. 2: First infections, forming a cone with its vertex at the location of the truck accident and growing in the direction of the wind. (dots in red)


MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

My hypothesis is that the infection is airborne and waterborne, but is not transmitted person-to-person.

This is based on the location and time of the events described in the previous answer, on the symptoms reported afterwards down the river (Fig. 3), and on the lack of symptoms reported by attendees of crowded events such as conventions.

Color reference for the figures:
Black: Messages from a person not yet infected.
Red: First message with a symptom.
Blue: Subsequent messages from a person already infected.
Yellow: Messages about the truck accident of May 17th.

Fig. 3: Gastrointestinal symptoms near the river.

The outbreak seems to be contained, since there are very few new cases.

Since the few new cases are located mostly near the river, it could be advisable to deploy treatment resources in the neighboring communities to the south, along the Vast River.

PROCESS USED TO ARRIVE AT THE ANSWER

First, the messages were analyzed with a simple AWK program that gave the frequency of each word. Words related to flu and death were examined. After a while it became clear that synthetic messages were mixed with irrelevant random messages. Attention was then focused on the synthetic messages and the events they describe:
- Symptoms of flu and breathing problems
- Symptoms of gastrointestinal illness
- Reports of sick acquaintances
- Truck accidents
- Plane accident
- Car accidents
- Bomb threats
- Explosions in Smog Town
- Etc…
This process took about 20 hours (all research and development times are rough estimates, since no log was kept), and was mostly manual.
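The word-frequency step described above could be sketched as follows. This is a minimal Python illustration, not the original AWK program; the function name `word_frequencies` and the sample messages are invented for the example.

```python
import re
from collections import Counter

def word_frequencies(messages):
    """Count how often each lowercase word appears across all messages."""
    counts = Counter()
    for text in messages:
        # Words are runs of letters and apostrophes; punctuation is dropped.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical sample messages, invented for illustration only.
sample = [
    "Got a bad fever and chills today",
    "This fever will not go away",
    "Traffic is awful downtown",
]
freq = word_frequencies(sample)
print(freq.most_common(5))
```

Sorting the counts from most to least frequent makes it easier to spot candidate keywords by hand, which matches the mostly manual nature of this step.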

A program in Processing was developed to consolidate the different messages of the same person into a single record. (4 hours.)

An indicator of the first symptom message was added. Reports of sick acquaintances were removed.
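A minimal Python sketch of this consolidation step is shown below, under stated assumptions: the original was written in Processing, the keyword list `SYMPTOM_WORDS` is hypothetical, timestamps are assumed to be sortable ISO-style strings, and the removal of reports about sick acquaintances is not reproduced here.

```python
import re
from collections import defaultdict

# Assumed keyword list, for illustration only.
SYMPTOM_WORDS = {"fever", "chills", "cough", "diarrhea"}

def has_symptom(text):
    """Return True if any assumed symptom keyword occurs in the message."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in SYMPTOM_WORDS for word in words)

def build_person_records(messages):
    """Group (person_id, timestamp, text) tuples into one record per person.

    Each record is a time-sorted list of messages; only the earliest
    message containing a symptom keyword gets first_symptom=True.
    """
    by_person = defaultdict(list)
    for person_id, timestamp, text in messages:
        by_person[person_id].append(
            {"time": timestamp, "text": text, "first_symptom": False}
        )
    for rows in by_person.values():
        rows.sort(key=lambda row: row["time"])
        for row in rows:
            if has_symptom(row["text"]):
                row["first_symptom"] = True
                break  # later symptom messages are not flagged
    return dict(by_person)

# Hypothetical messages, invented for illustration only.
sample = [
    (1, "2011-05-18 09:30", "woke up with a fever"),
    (1, "2011-05-17 20:00", "great concert tonight"),
    (1, "2011-05-19 08:00", "still have this cough"),
    (2, "2011-05-18 10:00", "stuck in traffic again"),
]
records = build_person_records(sample)
```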

A program in Processing was developed to show the paths of those showing symptoms of illness, in order to look for patterns that could help locate the origin of the outbreak. (16 hours.)

The paths of the ill people appear random prior to their infection. (Fig. 4)

Fig. 4: The growing width of the line indicates the progress of time. The red dot marks the first message with symptoms; the blue line shows the path after infection.

Something significant was expected among the first people infected, but the outbreak begins by affecting many people at the same time.

A program in Processing was developed to capture, in hourly screenshots, the positions of the messages of infected people, with color indicating whether the person is not yet infected (black dot), is reporting a first symptom (red), or is already infected and sending a subsequent message (blue). A clear pattern emerged (see answer MC 1.1), but the cause of the outbreak was still missing. (12 hours.)
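The color assignment per hourly frame can be sketched in Python as follows. This is an illustration under assumptions, not the original Processing code: the function name, the tuple layout, and the sample data are hypothetical, and the drawing itself is left out.

```python
from collections import defaultdict
from datetime import datetime

def color_messages_by_hour(messages):
    """Assign a color to each message and group the results by hour.

    messages: list of (person_id, datetime, is_symptom) tuples.
    Returns {hour_start: [(person_id, color), ...]} in time order.
    """
    infected = set()
    frames = defaultdict(list)
    for person_id, when, is_symptom in sorted(messages, key=lambda m: m[1]):
        if person_id in infected:
            color = "blue"   # subsequent message of an infected person
        elif is_symptom:
            color = "red"    # first message with a symptom
            infected.add(person_id)
        else:
            color = "black"  # person not yet infected
        hour = when.replace(minute=0, second=0, microsecond=0)
        frames[hour].append((person_id, color))
    return dict(frames)

# Hypothetical messages, invented for illustration only.
messages = [
    ("a", datetime(2011, 5, 18, 8, 15), False),
    ("a", datetime(2011, 5, 18, 9, 5), True),
    ("a", datetime(2011, 5, 18, 9, 40), False),
    ("b", datetime(2011, 5, 18, 9, 20), False),
]
frames = color_messages_by_hour(messages)
```

Each key of `frames` corresponds to one screenshot; plotting the positions of its entries in their colors would give one frame of the hourly sequence.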

A program in Processing was developed to show these screenshots in a 3D projection (24 hours). Visual exploration revealed three clusters of newly infected people downtown (Fig. 5). Using a feature of the program (now used in Map-Time Surfer to view the path of the senders of specific messages), a filter in time and space was obtained and two groups were assembled: about 3000 infected cases in these clusters, and the cases that never show symptoms.

Fig. 5: Clusters investigated.

A lot of time was spent looking for a criterion to separate these two groups. Several techniques were tried, to no avail:
- Decision trees
- Neural networks
- Association rules

Additional variables were incorporated: the time of day at which messages were sent, the distance between messages, and the velocity needed to be at those points in time and space. Some curious values emerged, but nothing useful.
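The derived variables mentioned above (time of day, distance, and implied velocity between consecutive messages) could be computed along these lines. This is a hedged Python sketch: it assumes positions are latitude/longitude pairs and uses the haversine great-circle formula, which is a choice made for this example and not necessarily what was used originally. All names and sample values are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def movement_features(track):
    """Derive features for each pair of consecutive messages of one person.

    track: list of (datetime, lat, lon) tuples, sorted by time.
    """
    features = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        distance = haversine_km(lat0, lon0, lat1, lon1)
        hours = (t1 - t0).total_seconds() / 3600.0
        features.append({
            "hour_of_day": t1.hour,
            "distance_km": distance,
            # Speed is undefined when two messages share a timestamp.
            "speed_kmh": distance / hours if hours > 0 else None,
        })
    return features

# Hypothetical track, invented for illustration only.
track = [
    (datetime(2011, 5, 18, 8, 0), 0.0, 0.0),
    (datetime(2011, 5, 18, 10, 0), 1.0, 0.0),
    (datetime(2011, 5, 18, 10, 0), 1.0, 0.0),
]
features = movement_features(track)
```

Implausibly high speeds between two messages are the kind of curious value such features can surface.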

It was also thought that, if there were a healthy carrier of the disease and it were transmitted person-to-person, it could be interesting to see whether the infected people had been near the same person at any time. A Processing program generated these coincidences in time and space, and they were investigated with the igraph library in R (Fig. 6). A tree-shaped graph was expected, but not found.
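One way to generate such coincidences is to bucket messages by hour and by a coarse spatial grid, then pair up the people who share a bucket. The Python sketch below illustrates that idea only; the original Processing program may have defined "near" differently, and the grid size `cell_size`, the function name, and the sample data are assumptions.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime
from itertools import combinations

def coincidences(messages, cell_size=0.01):
    """Count how often each pair of people shares an hour and a grid cell.

    messages: list of (person_id, datetime, lat, lon) tuples.
    cell_size: grid cell width in degrees (an assumed value).
    Returns a Counter keyed by sorted (person_a, person_b) pairs.
    """
    buckets = defaultdict(set)
    for person_id, when, lat, lon in messages:
        key = (
            when.date(),
            when.hour,
            math.floor(lat / cell_size),
            math.floor(lon / cell_size),
        )
        buckets[key].add(person_id)
    pairs = Counter()
    for people in buckets.values():
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical messages, invented for illustration only.
messages = [
    ("a", datetime(2011, 5, 18, 9, 5), 42.301, 93.401),
    ("b", datetime(2011, 5, 18, 9, 50), 42.304, 93.405),
    ("c", datetime(2011, 5, 18, 9, 10), 42.350, 93.401),
    ("a", datetime(2011, 5, 18, 14, 0), 42.301, 93.401),
    ("b", datetime(2011, 5, 18, 14, 30), 42.302, 93.402),
]
pairs = coincidences(messages)
```

The resulting pair counts form a weighted edge list, which is the kind of input a graph library can load for inspection.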

Fig. 6: Graph of the people with the most coincidences in time and space.

An AWK program extracted the four-word sequences and their frequencies, to see whether any of the synthetic messages could separate the groups. It was found that virtually none of the senders of synthetic messages unrelated to symptoms of their own illness become infected afterwards. The only phrases that appeared as predictors of future infection were “set off the fire alarm” and “set the fire alarm off”, which were discarded after further inspection. (Fig. 7)
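The phrase evaluation can be sketched in Python as below. This is not the original AWK program. It assumes that a sender counts as "sicon" when their first symptom comes after the message, as "nocon" when they never report symptoms, and that messages sent after the sender's own onset are ignored; that last rule, the integer timestamps, and the sample data are assumptions made for the example.

```python
from collections import defaultdict

def four_word_sequences(text):
    """Return the set of distinct four-word sequences in one message."""
    words = text.lower().split()
    return {" ".join(words[i:i + 4]) for i in range(len(words) - 3)}

def phrase_infection_table(messages, onset):
    """Count distinct senders per phrase, split by later infection.

    messages: list of (person_id, time, text) tuples.
    onset: {person_id: time of first symptom} for infected people only.
    """
    sicon = defaultdict(set)
    nocon = defaultdict(set)
    for person_id, when, text in messages:
        first_symptom = onset.get(person_id)
        if first_symptom is None:
            target = nocon          # sender never reports symptoms
        elif first_symptom > when:
            target = sicon          # sender becomes infected afterwards
        else:
            continue                # sent after onset: ignored (assumption)
        for phrase in four_word_sequences(text):
            target[phrase].add(person_id)
    phrases = set(sicon) | set(nocon)
    return {p: {"sicon": len(sicon[p]), "nocon": len(nocon[p])} for p in phrases}

# Hypothetical data; times are hours since the start of the data set.
messages = [
    ("a", 10, "someone set off the fire alarm"),
    ("b", 12, "they set off the fire alarm"),
    ("c", 15, "set off the fire alarm again"),
]
onset = {"a": 30, "c": 5}
table = phrase_infection_table(messages, onset)
```

A phrase would look like a predictor of infection when its "sicon" count is large relative to its "nocon" count.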

Fig. 7: Evaluation of phrases as predictors of infection. (sicon = number of persons who become infected at some point after sending this message; nocon = number not infected)

All these attempts easily took about 100 hours.

As an extra detail, an indicator of the wind direction was added, and it then became apparent that an airborne event west of the city center could have started the outbreak. This recalled the truck accident of May 17th on the route 610 bridge, where a truck spilled its cargo. Since the bridge is over the river, a waterborne agent would also explain the infections down the river on May 19th and after.
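The reasoning that first infections fall in a cone downwind of the accident can be expressed as a simple geometric test. The Python sketch below is only an illustration: it assumes flat map coordinates (x to the east, y to the north), takes the wind direction as the compass bearing the wind blows toward, and uses an arbitrary half-angle. None of these values come from the challenge data.

```python
import math

def in_downwind_cone(source, point, wind_toward_deg, half_angle_deg=20.0):
    """Return True if point lies in the cone downwind of source.

    source, point: (x, y) map coordinates, x east and y north (assumed).
    wind_toward_deg: compass bearing (0 = north, 90 = east) that the
        wind blows toward.
    half_angle_deg: half the opening angle of the cone (assumed value).
    """
    dx = point[0] - source[0]
    dy = point[1] - source[1]
    if dx == 0 and dy == 0:
        return True  # the source itself is the vertex of the cone
    # Compass bearing from the source to the point.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Smallest absolute difference between the two bearings.
    diff = abs((bearing - wind_toward_deg + 180.0) % 360.0 - 180.0)
    return diff <= half_angle_deg

# Hypothetical coordinates: accident at the origin, wind blowing east.
accident = (0.0, 0.0)
downwind = in_downwind_cone(accident, (10.0, 1.0), wind_toward_deg=90.0)
upwind = in_downwind_cone(accident, (-10.0, 0.0), wind_toward_deg=90.0)
crosswind = in_downwind_cone(accident, (0.0, 10.0), wind_toward_deg=90.0)
```

Applying such a test to the locations of first-symptom messages would give a rough measure of how well an airborne release from a candidate location explains them.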

(Relating this problem to mini-challenges 2 and 3, the guess was that the truck belonged to AFC and was blown up by a terrorist who knew its route and contents from a previous computer network intrusion.)

The Processing programs that show the messages and the paths were integrated into the one I now present as Map-Time Surfer, and the messages about the accident were incorporated as a final visualization of the situation. (12 hours.)

Andrés Parra, June 28, 2011.